Phoneme-Based Transliteration of Foreign Names for OOV Problem

Authors

  • Wei Gao
  • Kam-Fai Wong
  • Wai Lam
Abstract

One problem seriously affecting CLIR performance is the processing of queries with embedded foreign names. A proper noun dictionary is never complete, which renders dictionary-based name translation from English to Chinese ineffective. One way to solve this problem is not to rely on a dictionary alone but to adopt automatic translation based on pronunciation similarity, i.e. to map the phonemes comprising an English name to the sound units (e.g. pinyin) of the corresponding Chinese name. This process is called transliteration. We present a statistical transliteration method for CLIR applications and describe an efficient algorithm for phoneme alignment. Unlike traditional rule-based approaches, our method is data-driven, so it is independent of dialect features in Chinese. In addition, it differs from other statistical approaches based on the source-channel framework in that we adopt a direct transliteration model, i.e. the direction of probabilistic estimation is consistent with the transliteration direction. We demonstrate accuracy comparable to that of other systems.
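
The difference between a direct model and a source-channel model can be illustrated with a minimal sketch. A source-channel system picks the Chinese output C that maximizes P(E | C) P(C), estimating probabilities in the reverse (target-to-source) direction; a direct model estimates P(C | E) in the same direction as transliteration. The sketch below assumes the English name has already been converted to phonemes, segmented into chunks, and aligned one-to-one with pinyin syllables; the paper's own alignment algorithm and exact model form are not given in the abstract and are not reproduced here. The toy training pairs and the function names `train_direct_model` and `transliterate` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy data: each name is pre-segmented and pre-aligned into
# (English phoneme chunk, pinyin syllable) pairs. These alignments are made up
# for illustration; the paper's phoneme alignment step is NOT implemented here.
AlignedName = List[Tuple[str, str]]

TOY_TRAINING: List[AlignedName] = [
    [("S", "si"), ("M IH", "mi"), ("TH", "si")],
    [("T AO", "tuo"), ("M", "mu")],
    [("M IH", "mi"), ("L", "er")],
    [("S", "si"), ("T AO", "tuo")],
    [("S", "shi"), ("M IH", "mi"), ("TH", "si")],
]


def train_direct_model(data: List[AlignedName]) -> Dict[str, Dict[str, float]]:
    """Estimate P(pinyin | phoneme chunk) by relative frequency.

    This is the 'direct' direction (English -> Chinese). A source-channel
    model would instead estimate P(phoneme chunk | pinyin) and combine it
    with a prior P(pinyin) over Chinese outputs.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for name in data:
        for phoneme_chunk, syllable in name:
            counts[phoneme_chunk][syllable] += 1

    model: Dict[str, Dict[str, float]] = {}
    for phoneme_chunk, syllable_counts in counts.items():
        total = sum(syllable_counts.values())
        model[phoneme_chunk] = {
            syllable: n / total for syllable, n in syllable_counts.items()
        }
    return model


def transliterate(
    model: Dict[str, Dict[str, float]], chunks: List[str], unknown: str = "?"
) -> List[str]:
    """Map each phoneme chunk to its most probable pinyin syllable.

    Each chunk is decoded independently (no context), which is a deliberate
    simplification. Unseen chunks are mapped to `unknown`.
    """
    output: List[str] = []
    for chunk in chunks:
        dist = model.get(chunk)
        if not dist:
            output.append(unknown)
        else:
            # Iterating over sorted keys makes ties resolve alphabetically.
            output.append(max(sorted(dist), key=dist.get))
    return output


if __name__ == "__main__":
    model = train_direct_model(TOY_TRAINING)
    print(" ".join(transliterate(model, ["M IH", "L", "S"])))  # prints "mi er si"
```

Decoding each chunk independently keeps the example short; a practical system would also need the alignment step and some notion of context, which this sketch leaves out.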

Similar resources

Phoneme-based Statistical Transliteration of Foreign Names for OOV Problem

Given a source-language term, machine transliteration automatically generates its phonetic equivalents in a target language. It is useful in many cross-language applications. Recently, there has been increasing attention to automatic transliteration, especially for languages with significant differences in their phonetic representations, e.g. English and Chinese. Despite many cross-language...

Language Independent Transliteration System Using Phrase-based SMT Approach on Substrings

Every day the newswire introduces events from all over the world, highlighting new names of persons, locations and organizations with different origins. These names appear as Out of Vocabulary (OOV) words for machine translation, cross-lingual information retrieval, and many other NLP applications. One way to deal with OOV words is to transliterate the unknown words, that is, to render them in th...

Extracting English-Korean Transliteration Equivalence from Domain-Specific Dictionaries

Automatic translation knowledge acquisition or automatic bilingual dictionary construction has become an important first step for natural language applications such as machine translation and cross-language information retrieval. Transliterations are used to translate proper names and technical terms especially from languages in Roman alphabets to languages in non-Roman alphabets such as from E...

Learning to Find Transliteration on the Web

This prototype demonstrates a novel method for learning to find transliterations of proper nouns on the Web, based on query expansion aimed at maximizing the probability of retrieving transliterations from existing search engines. Since the method we use involves learning the morphological relationships between names and their transliterations, we refer to this IR-based approach as morphological...

Optimizing Transliteration for Hindi/Marathi to English Using only Two Weights

Machine transliteration has received significant research attention in the last two decades. It is observed that Hindi-to-English and Marathi-to-English named entity machine transliteration is comparatively less studied. Currently, research work in this domain is carried out using grapheme-based statistical approaches. But, to achieve better accuracy for the transliteration, an adequate bilingual t...

Publication date: 2004